Energy-Based Hindsight Experience Prioritization
In Hindsight Experience Replay (HER), a reinforcement learning agent is
trained by treating whatever it has achieved as virtual goals. However, in
previous work, the experience was replayed at random, without considering which
episode might be the most valuable for learning. In this paper, we develop an
energy-based framework for prioritizing hindsight experience in robotic
manipulation tasks. Our approach is inspired by the work-energy principle in
physics. We define a trajectory energy function as the sum of the transition
energy of the target object over the trajectory. We hypothesize that replaying
episodes that have high trajectory energy is more effective for reinforcement
learning in robotics. To verify our hypothesis, we designed a framework for
hindsight experience prioritization based on the trajectory energy of goal
states. The trajectory energy function takes the potential, kinetic, and
rotational energy into consideration. We evaluate our Energy-Based
Prioritization (EBP) approach on four challenging robotic manipulation tasks in
simulation. Our empirical results show that our proposed method surpasses
state-of-the-art approaches in terms of both performance and sample-efficiency
on all four tasks, without increasing computational time. A video showing
experimental results is available at https://youtu.be/jtsF2tTeUGQ
Comment: Published in Conference on Robot Learning (CoRL 2018) as oral
presentation (7%), Zurich, Switzerland.
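As a concrete illustration of the prioritization scheme described above, the following Python/NumPy sketch computes a trajectory energy from per-transition potential, kinetic, and rotational terms and replays episodes with probability proportional to that energy. The state fields (height z, linear velocity v, angular velocity w as NumPy arrays), the unit mass and inertia, and the buffer interface are illustrative assumptions, not the authors' exact implementation.

    import numpy as np

    def transition_energy(s, s_next, m=1.0, inertia=1.0, g=9.81):
        """Work-energy inspired estimate of the energy gained by the target
        object between two consecutive states (dicts with keys z, v, w)."""
        potential = m * g * max(s_next["z"] - s["z"], 0.0)
        kinetic = 0.5 * m * max(s_next["v"] @ s_next["v"] - s["v"] @ s["v"], 0.0)
        rotational = 0.5 * inertia * max(s_next["w"] @ s_next["w"] - s["w"] @ s["w"], 0.0)
        return potential + kinetic + rotational

    def trajectory_energy(episode):
        """Sum the transition energies over a whole episode."""
        return sum(transition_energy(s, s_next)
                   for s, s_next in zip(episode[:-1], episode[1:]))

    def sample_episode(buffer, rng=np.random.default_rng()):
        """Replay episodes with probability proportional to trajectory energy."""
        energies = np.array([trajectory_energy(ep) for ep in buffer])
        probs = energies / energies.sum()
        return buffer[rng.choice(len(buffer), p=probs)]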
Tensor Decompositions for Modeling Inverse Dynamics
Modeling inverse dynamics is crucial for accurate feedforward robot control.
The model computes the joint torques necessary to perform a desired movement.
The highly non-linear inverse function of the dynamical system can be
approximated using regression techniques. We propose as regression method a
tensor decomposition model that exploits the inherent three-way interaction of
positions x velocities x accelerations. Most work in tensor factorization has
addressed the decomposition of dense tensors. In this paper, we build upon the
decomposition of sparse tensors, with only small amounts of nonzero entries.
The decomposition of sparse tensors has successfully been used in relational
learning, e.g., the modeling of large knowledge graphs. Recently, the approach
has been extended to multi-class classification with discrete input variables.
Representing the data in high-dimensional sparse tensors enables the
approximation of complex highly non-linear functions. In this paper we show how
the decomposition of sparse tensors can be applied to regression problems.
Furthermore, we extend the method to continuous inputs, by learning a mapping
from the continuous inputs to the latent representations of the tensor
decomposition, using basis functions. We evaluate our proposed model on a
dataset with trajectories from a seven degrees of freedom SARCOS robot arm. Our
experimental results show superior performance of the proposed functional
tensor model compared to challenging state-of-the-art methods.
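The functional extension described above can be pictured as follows: basis functions map each continuous input group to the latent factors of a low-rank three-way (CP-style) interaction, whose product is read out as joint torques. This Python/NumPy sketch only conveys that structure; the radial basis functions, the rank, and all parameter names are assumptions rather than the paper's exact model.

    import numpy as np

    def rbf_features(x, centers, width=1.0):
        """Map a continuous input vector to radial basis function activations."""
        d = np.linalg.norm(x[None, :] - centers, axis=1)
        return np.exp(-(d / width) ** 2)

    def predict_torques(q, qd, qdd, params):
        """Rank-R three-way interaction of the latent representations of
        positions (q), velocities (qd), and accelerations (qdd)."""
        a = rbf_features(q, params["centers_q"]) @ params["U_q"]        # (R,)
        b = rbf_features(qd, params["centers_qd"]) @ params["U_qd"]     # (R,)
        c = rbf_features(qdd, params["centers_qdd"]) @ params["U_qdd"]  # (R,)
        # element-wise product over the shared rank, linear readout per joint
        return params["W_out"] @ (a * b * c)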
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient
Learning goal-oriented dialogues by means of deep reinforcement learning has
recently become a popular research topic. However, commonly used policy-based
dialogue agents often end up focusing on simple utterances and suboptimal
policies. To mitigate this problem, we propose a class of novel
temperature-based extensions for policy gradient methods, which are referred to
as Tempered Policy Gradients (TPGs). On a recent AI testbed, i.e., the
GuessWhat?! game, we achieve significant improvements with two innovations. The
first one is an extension of the state-of-the-art solutions with Seq2Seq and
Memory Network structures that leads to an improvement of 7%. The second one is
the application of our newly developed TPG methods, which improves the
performance additionally by around 5% and, even more importantly, helps produce
more convincing utterances.
Comment: Published in IEEE Spoken Language Technology (SLT 2018), Athens, Greece.
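To make the role of the temperature concrete, the sketch below shows a REINFORCE-style update in which exploration actions are drawn from a temperature-scaled softmax while the gradient is taken under the untempered policy. How TPG actually schedules or selects temperatures is the paper's contribution; the fixed temperature here is purely illustrative.

    import numpy as np

    def tempered_sample(logits, temperature, rng=np.random.default_rng()):
        """Sample an action from a softmax whose sharpness is set by the temperature."""
        z = logits / temperature
        z -= z.max()
        probs = np.exp(z) / np.exp(z).sum()
        return rng.choice(len(logits), p=probs)

    def reinforce_direction(logits, action, reward, baseline=0.0):
        """Ascent direction grad log pi(action) * advantage, computed under the
        untempered policy that is being optimized."""
        z = logits - logits.max()
        probs = np.exp(z) / np.exp(z).sum()
        one_hot = np.zeros_like(probs)
        one_hot[action] = 1.0
        return (one_hot - probs) * (reward - baseline)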
Curiosity-Driven Experience Prioritization via Density Estimation
In Reinforcement Learning (RL), an agent explores the environment and
collects trajectories into the memory buffer for later learning. However, the
collected trajectories can easily be imbalanced with respect to the achieved
goal states. The problem of learning from imbalanced data is a well-known
problem in supervised learning, but has not yet been thoroughly researched in
RL. To address this problem, we propose a novel Curiosity-Driven Prioritization
(CDP) framework to encourage the agent to over-sample those trajectories that
have rare achieved goal states. The CDP framework mimics the human learning
process and focuses more on relatively uncommon events. We evaluate our methods
using the robotic environment provided by OpenAI Gym. The environment contains
six robot manipulation tasks. In our experiments, we combined CDP with Deep
Deterministic Policy Gradient (DDPG) with or without Hindsight Experience
Replay (HER). The experimental results show that CDP improves both performance
and sample-efficiency of reinforcement learning agents, compared to
state-of-the-art methods.
Comment: Accepted by the NIPS Deep RL Workshop, 2018, link:
https://sites.google.com/view/deep-rl-workshop-nips-2018. arXiv admin note:
substantial text overlap with arXiv:1810.01363 and text overlap with
arXiv:1905.0878
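One way to picture the prioritization is with a simple density estimate over achieved goal states: trajectories whose goals fall in low-density regions are replayed more often. The Gaussian kernel density estimator and the inverse-density weighting in this Python/NumPy sketch are illustrative assumptions, not necessarily the density model used in the paper.

    import numpy as np

    def kde_density(goal, goals, bandwidth=0.1):
        """Average Gaussian kernel value of one achieved goal against all others."""
        d2 = np.sum((goals - goal) ** 2, axis=1)
        return np.mean(np.exp(-d2 / (2 * bandwidth ** 2)))

    def replay_probabilities(achieved_goals):
        """Rarer achieved goals (lower estimated density) get larger replay weights."""
        densities = np.array([kde_density(g, achieved_goals) for g in achieved_goals])
        weights = 1.0 / (densities + 1e-8)
        return weights / weights.sum()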
Efficient Dialog Policy Learning via Positive Memory Retention
This paper is concerned with the training of recurrent neural networks as
goal-oriented dialog agents using reinforcement learning. Training such agents
with policy gradients typically requires a large number of samples. However,
the collection of the required data in the form of conversations between chat-bots
and human agents is time-consuming and expensive. To mitigate this problem, we
describe an efficient policy gradient method using positive memory retention,
which significantly increases the sample-efficiency. We show that our method is
10 times more sample-efficient than policy gradients in extensive experiments
on a new synthetic number guessing game. Moreover, in a real-world visual object
discovery game, the proposed method is twice as sample-efficient as policy
gradients and shows state-of-the-art performance.
Comment: Published in IEEE Spoken Language Technology (SLT 2018), Athens, Greece.
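The idea of retaining and reusing successful conversations can be sketched as a small buffer of positively rewarded episodes that are replayed with an importance-sampling correction between the current policy and the policy that collected them. The buffer interface and the clipping constant below are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    class PositiveMemory:
        def __init__(self, max_ratio=5.0):
            self.episodes = []          # (log_prob_at_collection, reward, episode data)
            self.max_ratio = max_ratio  # clip importance ratios for stability

        def maybe_store(self, old_log_prob, reward, data):
            """Retain only episodes that ended with a positive reward."""
            if reward > 0:
                self.episodes.append((old_log_prob, reward, data))

        def replay_weight(self, new_log_prob, old_log_prob):
            """Importance ratio between the current policy and the behavior policy."""
            return min(np.exp(new_log_prob - old_log_prob), self.max_ratio)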
Logistic Tensor Factorization for Multi-Relational Data
Tensor factorizations have become increasingly popular approaches for various
learning tasks on structured data. In this work, we extend the RESCAL tensor
factorization, which has shown state-of-the-art results for multi-relational
learning, to account for the binary nature of adjacency tensors. We study the
improvements that can be gained via this approach on various benchmark datasets
and show that the logistic extension can improve the prediction results
significantly.
Comment: Accepted at the ICML 2013 Workshop "Structured Learning: Inferring Graphs
from Structured and Unstructured Inputs" (SLG 2013).
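Concretely, the logistic extension models each entry of the binary adjacency tensor as a Bernoulli variable whose probability is a sigmoid of the RESCAL score a_i^T R_k a_j. The shapes, rank, and loss in this Python/NumPy sketch are a minimal illustration of that idea.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def triple_probability(A, R, i, k, j):
        """P(X[k, i, j] = 1) = sigmoid(a_i^T R_k a_j), with entity embeddings
        A of shape (n_entities, rank) and one (rank, rank) matrix R[k] per relation."""
        return sigmoid(A[i] @ R[k] @ A[j])

    def bernoulli_nll(X, A, R, eps=1e-12):
        """Logistic (Bernoulli) loss over a binary adjacency tensor X of shape
        (n_relations, n_entities, n_entities)."""
        nll = 0.0
        for k in range(X.shape[0]):
            P = sigmoid(A @ R[k] @ A.T)
            nll -= np.sum(X[k] * np.log(P + eps) + (1 - X[k]) * np.log(1 - P + eps))
        return nll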
The Tensor Memory Hypothesis
We discuss memory models which are based on tensor decompositions using
latent representations of entities and events. We show how episodic memory and
semantic memory can be realized and discuss how new memory traces can be
generated from sensory input: Existing memories are the basis for perception
and new memories are generated via perception. We relate our mathematical
approach to the hippocampal memory indexing theory. We describe the first
detailed mathematical models for the complete processing pipeline from sensory
input and its semantic decoding, i.e., perception, to the formation of episodic
and semantic memories and their declarative semantic decodings. Our main
hypothesis is that perception includes an active semantic decoding process,
which relies on latent representations of entities and predicates, and that
episodic and semantic memories depend on the same decoding process. We
contribute to the debate between the leading memory consolidation theories,
i.e., the standard consolidation theory (SCT) and the multiple trace theory
(MTT). The latter is closely related to the complementary learning systems
(CLS) framework. In particular, we show explicitly how episodic memory can
teach the neocortex to form a semantic memory, which is a core issue in MTT and
CLS.
Comment: Presented at the MLINI-2016 workshop, 2016 (arXiv:1701.01437). Report-no:
MLINI/2016/0
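The multilinear form behind this view can be stated in a few lines: a semantic triple is scored from the latent vectors of its subject, predicate, and object, and an episodic trace is the same score additionally gated by a latent representation of the time index. The sketch below is only a schematic illustration of that decoding; the shapes and the particular multilinear form are assumptions.

    import numpy as np

    def semantic_score(a_s, a_p, a_o):
        """Degree of belief that the triple (subject, predicate, object) holds."""
        return np.sum(a_s * a_p * a_o)

    def episodic_score(a_t, a_s, a_p, a_o):
        """Degree of belief that the triple was perceived at time t: the semantic
        score modulated by the latent representation of the time index."""
        return np.sum(a_t * a_s * a_p * a_o)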
Improving Information Extraction from Images with Learned Semantic Models
Many applications require an understanding of an image that goes beyond the
simple detection and classification of its objects. In particular, a great deal
of semantic information is carried in the relationships between objects. We
have previously shown that the combination of a visual model and a statistical
semantic prior model can improve on the task of mapping images to their
associated scene description. In this paper, we review the model and compare it
to a novel conditional multi-way model for visual relationship detection, which
does not include an explicitly trained visual prior model. We also discuss
potential relationships between the proposed methods and memory models of the
human brain.
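A simple way to picture the combination of a visual model with a semantic prior is a log-linear blend of the two scores when ranking candidate (subject, predicate, object) triples for an image. The callables and the mixing weight in this Python sketch are hypothetical placeholders, not the models compared in the paper.

    def combined_score(visual_log_score, prior_log_score, alpha=0.5):
        """Blend evidence from the image with the a-priori plausibility of the triple."""
        return (1 - alpha) * visual_log_score + alpha * prior_log_score

    def best_triple(candidates, visual_model, semantic_prior):
        """Pick the relationship whose combined visual + prior score is highest."""
        return max(candidates,
                   key=lambda t: combined_score(visual_model(t), semantic_prior(t)))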
Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks
With the rising number of interconnected devices and sensors, modeling
distributed sensor networks is of increasing interest. Recurrent neural
networks (RNN) are considered particularly well suited for modeling sensory and
streaming data. When predicting future behavior, incorporating information from
neighboring sensor stations is often beneficial. We propose a new RNN based
architecture for context specific information fusion across multiple spatially
distributed sensor stations. Hereby, latent representations of multiple local
models, each modeling one sensor station, are joined and weighted according
to their importance for the prediction. The particular importance is assessed
depending on the current context using a separate attention function. We
demonstrate the effectiveness of our model on three different real-world sensor
network datasets.
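The fusion step can be summarized as context-dependent attention over the latent states of the local station models: a separate attention function scores each station against the current context, and the fused representation is the softmax-weighted sum. The bilinear scoring form and the shapes in this Python/NumPy sketch are illustrative assumptions.

    import numpy as np

    def softmax(x):
        z = x - x.max()
        e = np.exp(z)
        return e / e.sum()

    def fuse_stations(latents, context, W_attn):
        """latents: (n_stations, d) local latent states; context: (c,) current
        context vector; W_attn: (c, d) bilinear attention parameters."""
        scores = latents @ (W_attn.T @ context)   # relevance of each station
        weights = softmax(scores)                 # importance under the current context
        return weights @ latents, weights         # fused (d,) vector and the weights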
Tensor-Train Recurrent Neural Networks for Video Classification
Recurrent Neural Networks and their variants have shown promising
performance in sequence modeling tasks such as Natural Language Processing.
These models, however, turn out to be impractical and difficult to train when
exposed to very high-dimensional inputs due to the large input-to-hidden weight
matrix. This may have prevented RNNs' large-scale application in tasks that
involve very high input dimensions such as video modeling; current approaches
reduce the input dimensions using various feature extractors. To address this
challenge, we propose a new, more general and efficient approach by factorizing
the input-to-hidden weight matrix using Tensor-Train decomposition which is
trained simultaneously with the weights themselves. We test our model on
classification tasks using multiple real-world video datasets and achieve
competitive performances with state-of-the-art models, even though our model
architecture is orders of magnitude less complex. We believe that the proposed
approach provides a novel and fundamental building block for modeling
high-dimensional sequential data with RNN architectures and opens up many
possibilities to transfer the expressive and advanced architectures from other
domains such as NLP to modeling high-dimensional sequential data.
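To see where the parameter savings come from, the sketch below factorizes a dense input-to-hidden matrix W of shape (m1*m2) x (n1*n2) into two Tensor-Train cores and applies it to an input vector by contracting one mode at a time. Using only two cores, a single rank r, and a plain NumPy einsum is an illustrative simplification; the actual method uses more cores and learns them jointly with the rest of the RNN.

    import numpy as np

    def tt_matvec_2core(core1, core2, x, n1, n2):
        """core1: (m1, n1, r), core2: (r, m2, n2). Computes y = W x where
        W[(i1, i2), (j1, j2)] = sum_a core1[i1, j1, a] * core2[a, i2, j2]."""
        X = x.reshape(n1, n2)
        # contract the first input mode j1 with the first core -> (m1, r, n2)
        t = np.einsum('jk,ijr->irk', X, core1)
        # contract the rank and second input mode j2 with the second core -> (m1, m2)
        Y = np.einsum('irk,rlk->il', t, core2)
        return Y.reshape(-1)

    # The dense matrix would need m1*m2*n1*n2 parameters; the two cores need
    # only m1*n1*r + r*m2*n2, which is the source of the compression.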